An Empirical Investigation of Discounting in Cross-Domain Language Models
Abstract
We investigate the empirical behavior of n-gram discounts within and across domains. When a language model is trained and evaluated on two corpora from exactly the same domain, discounts are roughly constant, matching the assumptions of modified Kneser-Ney LMs. However, when training and test corpora diverge, the empirical discount grows essentially as a linear function of the n-gram count. We adapt a Kneser-Ney language model to incorporate such growing discounts, resulting in perplexity improvements over modified Kneser-Ney and Jelinek-Mercer baselines.
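One way to make the notion of an "empirical discount" concrete is a held-out estimate: for all n-grams seen k times in training, compare k with their average count in a test corpus. The sketch below illustrates that idea only. The function names `ngrams` and `empirical_discounts` are hypothetical, and rescaling test counts by the ratio of corpus sizes is an assumption made here; the abstract does not specify the paper's exact procedure.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Yield consecutive n-grams (as tuples) from a list of tokens."""
    return zip(*(tokens[i:] for i in range(n)))


def empirical_discounts(train_tokens, test_tokens, n=2):
    """Map each training count k to (k - mean rescaled test count).

    Assumption: test counts are rescaled by the ratio of total n-gram
    tokens so that corpora of different sizes are comparable.
    """
    train_counts = Counter(ngrams(train_tokens, n))
    test_counts = Counter(ngrams(test_tokens, n))

    train_total = sum(train_counts.values())
    test_total = sum(test_counts.values())
    scale = train_total / test_total if test_total else 0.0

    # Group rescaled test counts by the n-gram's training count k.
    test_counts_by_k = defaultdict(list)
    for gram, k in train_counts.items():
        test_counts_by_k[k].append(test_counts.get(gram, 0) * scale)

    # Empirical discount for k: how much the training count overestimates
    # the average held-out count of n-grams seen k times.
    return {
        k: k - sum(values) / len(values)
        for k, values in sorted(test_counts_by_k.items())
    }


if __name__ == "__main__":
    train = "a b a b a b c d".split()
    test = "a b c d a b x y".split()
    for k, d in empirical_discounts(train, test).items():
        print(f"count {k}: empirical discount {d:.2f}")
```

With a realistic corpus pair, plotting the returned discount against k would show whether it stays roughly flat (the same-domain behavior the abstract describes) or rises with k (the cross-domain behavior).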
Similar resources
Nominalization in Academic Writing: A Cross-disciplinary Investigation of Physics and Applied Linguistics Empirical Research Articles
The present study aimed to explore how nominalization is manifested in a sample of Physics and Applied Linguistics research articles (RAs), representing hard and soft sciences respectively. To this end, 60 RAs from discipline-related professional journals were randomly selected and analyzed in light of Halliday and Matthiessen’s (1999) taxonomy of nominalization. Comparing the normalized freque...
Investigation and Statistical comparison of the soil empirical desalinization models for salin-sodic soils (Case study: Khuzestan province)
Accumulation of soluble salts at the soil surface and in the soil profile is inevitable in arid areas, which resemble most regions of Iran, because of low precipitation and high evaporation. High concentrations of soluble salts in the soil profile cause severe problems for root water uptake, so plant growth stops. Reducing soil salinity to an optimal level by leaching and avoiding soil ponding must be con...
Domain mining for machine translation
Massive amounts of data for data mining consist of natural language data. A challenge in natural language is to translate the data into a particular language. Machine translation can do the translation automatically. However, the models trained on data from a domain tend to perform poorly for different domains. One way to resolve this issue is to train domain adaptation translation and language...
Hysteresis: Phenomenon and Modeling in Soil- Water Relationship
Hysteresis has been widely recognized in the soil water relationship. In this paper, a detailed review of hysteresis was performed in relation to its models. So far, different models have been suggested to describe hysteresis in the water retention curve (WRC) that could be categorized into two main groups: conceptual and empirical models. The models in the first group are based on the domain t...
The Investigation of the Perspectives of Iranian EFL Domain Experts on Postmethod Pedagogy: A Delphi Technique
After the introduction of postmethod pedagogy by Kumaravadivelu with its three principles of particularity, possibility and practicality, a wave of attention was directed towards this so-called 'postmethod era' and its appropriacy and adequacy in satiating the demands of the language learners in this 'brand new world'. This situation has created a healthy debate among the Iranian EFL community ...
Publication date: 2011